6. Data Management Considerations
Methods presented in Section 5: Statistical Tests and Methods vary in complexity from relatively straightforward graphical methods to complex matrix-based procedures, such as kriging (a weighted moving-average technique to interpolate the data distribution by calculating an area mean at nodes of a grid; Gilbert 1987). Correspondingly, the software packages listed in Appendix D range in capabilities from specialized spreadsheet-based groundwater calculators to comprehensive high-powered statistical software suites that are not industry specific. Most of these packages will accept input data from spreadsheets or text files, and many commercial packages are able to connect directly to user databases. Regardless of the system used, input data files should always be provided in electronic format with statistical analysis deliverables to allow for verification and cross-checking with different models, as appropriate.
Data management strategies will vary depending on the amount and type of data collected using a systematic planning process, as presented in Section 3. For example, a small dry cleaner site may conduct trend analysis on source or boundary wells or both to evaluate concentration changes over time for post-injection monitoring of an in situ bioremediation remedy. Tracking groundwater monitoring data using spreadsheet software may be sufficient for a project of this nature.
For large, complex, multi-source Comprehensive Environmental Response, Compensation, and Liability Act (CERCLA) sites where there are numerous contaminants and separate monitoring systems, a more sophisticated statistical approach may be warranted. With a large data set, preparing groundwater data for statistical analysis can be more time consuming than performing the analysis itself. For these sites, it can be more cost-effective to invest in a more robust data management solution. Commercial environmental data management software is available for this purpose. Comprehensive enterprise-level products developed under direction of the Department of Defense include:
- "The Environmental Restoration Information System (ERIS) is a Web-based database system for the storage of Army environmental restoration and range field data. It serves as a central repository for the Army installation chemical, geological, and geographical data." (US Army 2013a)
- "Environmental Resources Program Information Management System (ERPIMS) is the Air Force system for validation and management of data from environmental projects at all Air Force bases. These data contain analytical chemistry samples, tests, and results, as well as, hydrogeological information, site/location descriptions, and monitoring well characteristics." (USAF 2013).
- "Navy Installation Restoration Information Solution (NIRIS) is a web-based system that manages the Navy's environmental data, documents and records related to cleanup of hazardous waste sites. NIRIS provides the Navy’s remedial project managers (RPMs) and other environmental professionals with tools to effectively analyze, visualize, and present analytical and spatial data." (US Navy 2013b).
Most labs now deliver analytical results electronically, and several state and federal organizations have established specific electronic data deliverable (EDD) format requirements. USEPA has developed the staged electronic data deliverable (SEDD) format to support uniform delivery, review, storage, and retrieval of laboratory data (USEPA 2011a).
However, site data management may be complicated by turnover in site project managers and regulators over the life of a remediation project. Historical data may only be available as hard copy tables, presentation-level crosstab spreadsheets, or in other formats. Cleanup and conversion of legacy data can be very time- and labor-intensive, so users must balance the level of effort needed to convert data to a usable form against the value of the data to the statistical approach. See Section 3.3.2: Historical Data for additional discussion on the usefulness of historical data for statistical evaluation. If the data set is small, it may be fastest to hand-enter the data needed for analysis. Information regarding methods for automated data conversion and cleanup, such as scanning and optical character recognition, is available online.
Good Practices for Managing Groundwater Monitoring Data
The general “good practices” listed below will help streamline data analysis and provide a basic structure listing for well construction, analytical results, field data, and geographical coordinates. This structure can be expanded upon as additional data needs are identified. The information presented here is intended as a starting point; you should determine the database formats and information requirements for each project. For more comprehensive data systems, users should follow established data standards such as USEPA’s SEDD format referenced above.
1. Provide well construction data for each well/monitoring interval in a single row for each well/screen interval:
| Well Number | Well Diameter | Total Depth | Top of Screen | Length of Screen | Top of Casing Elevation | Reference Datum |
|---|---|---|---|---|---|---|
- Total depth measurements are typically entered as a positive depth value qualified as below ground surface or BGS.
- For sites with complex geology, parameters such as depth to first water after drilling, depth of drilling fluid circulation loss or other relevant measurements may also be tracked.
2. Provide analytical results and groundwater elevations in a “flattened” format, in which each row of data contains data collected from a single well/screen interval for a single contaminant on a single date. Tabulated analytical results should also include lab analysis qualifiers (such as I, J, and U), practical quantitation limits, and method detection limits to allow flexibility in identifying and managing nondetect values and potential outliers.
| Well Number | Sample Date | Contaminant A | Concentration A | Lab Qualifier A | PQL A | MDL A | Preparation Method | Analytical Method |
|---|---|---|---|---|---|---|---|---|
- Contaminant listings should include a field for Chemical Abstract Service Registry Number (CASRN) or other standardized designation since many chemicals may be identified under multiple names. For example, tetrachloroethene is also known as perchloroethene, perchloroethylene, Perc, and PCE.
- All numeric results should be formatted to a predetermined precision (number of decimals). For most contaminants whole numbers are adequate; for a few, however, the nth decimal place can decide whether a site is left as is or remediated. If the precision is not set before data collection, columns could be incorrectly formatted and values set to "0" by accident.
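The flattened layout and the CASRN field described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the field names (`well`, `sample_date`, `contaminant`, `casrn`, `result`), the `CASRN_BY_NAME` lookup, and the sample values are all hypothetical, and a real project would maintain its own synonym list. The check-digit test follows the published CAS Registry Number rule (the last digit equals the position-weighted sum of the preceding digits, modulo 10).

```python
# Hypothetical synonym table mapping the names listed above to one CASRN.
CASRN_BY_NAME = {
    "tetrachloroethene": "127-18-4",
    "perchloroethene": "127-18-4",
    "perchloroethylene": "127-18-4",
    "perc": "127-18-4",
    "pce": "127-18-4",
}


def casrn_is_valid(casrn):
    """Return True if the final (check) digit matches the other digits."""
    digits = casrn.replace("-", "")
    if not digits.isdigit() or len(digits) < 5:
        return False
    body, check = digits[:-1], int(digits[-1])
    # Weight each digit by its position counted from the right, starting at 1.
    total = sum(int(d) * i for i, d in enumerate(reversed(body), start=1))
    return total % 10 == check


def flatten(crosstab_rows, id_fields=("well", "sample_date")):
    """Convert one-row-per-sampling-event records into one row per
    well, date, and contaminant."""
    flat = []
    for row in crosstab_rows:
        for name, value in row.items():
            if name in id_fields:
                continue
            record = {f: row[f] for f in id_fields}
            record["contaminant"] = name
            # None signals a name that still needs to be mapped by hand.
            record["casrn"] = CASRN_BY_NAME.get(name.strip().lower())
            record["result"] = value
            flat.append(record)
    return flat


# Illustrative values only: two events that report the same chemical
# under different names.
crosstab = [
    {"well": "MW-1", "sample_date": "2013-03-01", "PCE": 12.0},
    {"well": "MW-1", "sample_date": "2013-06-01", "Perchloroethene": 9.5},
]
flat = flatten(crosstab)
```

Grouping the flattened rows by `casrn` rather than by name keeps both sampling events in the same time series even though the source tables labeled the chemical differently.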
3. Present analytical measurements in three columns:
- One column is for the quantified value for that sample or a reporting limit if the sample is nondetect.
- Another column is for a flag (possibly numeric, such as 1 for detected and 0 for nondetect) signifying the status of that sample (such as detected, trace, nondetect). Standardized lab qualifiers also serve this purpose and can be stored in this column.
- A third column is for the units of the measurement, consistent with the risk assessment or criteria, such as μg/L or mg/L. Use of “parts per million” is not acceptable for groundwater evaluations.
| Result | Status | Units |
|---|---|---|
| 5 | 0 | mg/L |
Use this format rather than, for example, “<5”, “5J”, or a similar notation in the result column because most software will not function properly when numeric values are combined with text or symbols in the same column.
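When legacy tables already contain combined entries such as “<5” or “5J”, they can be split into the three columns with a small parser. The sketch below is one possible approach under stated assumptions: a leading “<” or a “U” qualifier is treated as nondetect, the number is taken as the reporting limit in that case, and units are supplied by the caller because the combined string does not carry them. The function name `split_result` and the output keys are hypothetical.

```python
import re

# Optional "<", a number, then an optional alphabetic qualifier (e.g. J, U).
_RESULT = re.compile(r"^\s*(<)?\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*$")


def split_result(text, units):
    """Split a combined entry like '<5' or '5J' into separate fields."""
    match = _RESULT.match(text)
    if match is None:
        # Surface anything unexpected for manual review instead of guessing.
        raise ValueError(f"unrecognized result entry: {text!r}")
    less_than, number, qualifier = match.groups()
    qualifier = qualifier.upper()
    # Assumption: "<" or a U qualifier marks a nondetect.
    nondetect = bool(less_than) or qualifier == "U"
    return {
        "result": float(number),
        "status": 0 if nondetect else 1,  # 1 = detected, 0 = nondetect
        "qualifier": qualifier,
        "units": units,
    }


rows = [split_result(entry, "mg/L") for entry in ["<5", "5J", "0.25"]]
```

Entries that do not match the expected pattern raise an error, so they can be checked against the source lab report before being loaded.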
4. Tabulate field sampling results by single well/screen interval and date to verify that samples come from the same target population.
| Well Number | Sample Date | Static Depth to Water | pH | Temperature | Conductivity | Dissolved Oxygen | Turbidity | ORP |
|---|---|---|---|---|---|---|---|---|
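Keeping well construction and field measurements in separate tables keyed by well number makes derived values straightforward to compute. As a minimal sketch, groundwater elevation can be obtained by subtracting the static depth to water from the top-of-casing elevation. This assumes the depth to water is measured from the top of casing and that both values share the same units and reference datum; the dictionary keys and numbers below are hypothetical.

```python
def groundwater_elevation(top_of_casing_elevation, depth_to_water):
    """Water-surface elevation in the datum of the top-of-casing survey.

    Assumes depth_to_water is a positive depth measured from top of casing.
    """
    if depth_to_water < 0:
        raise ValueError("depth to water should be entered as a positive depth")
    return top_of_casing_elevation - depth_to_water


# Illustrative records only: one row per well, and one row per well and date.
wells = {"MW-1": {"toc_elevation": 105.30}}
field = [{"well": "MW-1", "sample_date": "2013-03-01", "depth_to_water": 12.45}]

for record in field:
    toc = wells[record["well"]]["toc_elevation"]
    record["gw_elevation"] = round(
        groundwater_elevation(toc, record["depth_to_water"]), 2
    )
```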
5. While geospatial analysis is beyond the scope of this guidance, consistently collect and manage geographical coordinates and well survey elevations to simplify groundwater data analysis.
| Well Number | Sample Date | Latitude (Decimal Degree Or Degree, Minute, Second) | Longitude (Decimal Degree Or Degree, Minute, Second) | Collection Method | Datum | Verification Method |
|---|---|---|---|---|---|---|
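Because coordinates may arrive in either decimal degrees or degrees, minutes, and seconds, storing them in one form avoids mismatches later. The sketch below converts degree, minute, second values to signed decimal degrees, using the common convention that south latitudes and west longitudes are negative. The function name and the sample coordinates are hypothetical, and the conversion does not change the datum, which still needs to be recorded alongside the values.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds to signed decimal degrees."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in the range [0, 60)")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value


# Illustrative coordinates only.
latitude = dms_to_decimal(38, 30, 0, "N")
longitude = dms_to_decimal(77, 15, 0, "W")
```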
6. Provide source references (such as lab reports and field notes) for all data stored in the system to verify integrity.
7. Always back up your data.
Publication Date: December 2013